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ABSTRACT 

Many  authors  have  recently  modeled  learning  by  Bayesian  decision-makers  and  by  appeal  to  the 
Martingale  Convergence  Theorem  proven  that  posterior  beliefs  converge  a.s.  from  the  point  of  view  of  the 
decision-maker.  This  paper  emphasizes  that  this  is  not  equivalent  to  a.s.  convergence  from  the  point  of 
view  of  an  observer  who  knows  the  true  parameter  value.  For  a  variant  of  die  two-armed  bandit  problem  it 
is  demontrated  that  for  a  residual  family  of  parameter  values  and  priors  with  objective  probability  one 
posterior  beliefs  of  the  Bayesian  decision-maker  never  converge.  JEL  Classification  Numbers:  022,  026, 
211. 
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I.  INTRODUCTION 

Recently,  there  has  been  a  revival  of  interest  in  studying  the  asymptotic  dynamics  of  Bayesian  learning 
and  control  in  economic  environments.  One  set  of  authors,  including  Easley  and  Kiefcr  [9,  10],  Kiefer  and 
Nyarko  [16],  McLennan  [18],  and  Nyarko  [19],  have  analyzed  single  agent  decision  problems  for  which 
there  is  a  tradeoff  between  current  period  reward  and  the  value  of  the  information  generated  by  the  current 
period  action.  In  another  strand  of  the  literature,  Blume  and  Easley  [5],  Bray  and  Kreps  [6],  and  Feldman 
[11,  12]  attempt  to  characterize  the  tail  of  the  sequence  of  beliefs  and  outcomes  for  economies  with  many 
passively  learning  agents.  Specifically,  these  latter  papers  focus  on  whether  Bayesian  learning  by  agents 
with  a  correct  specification  of  the  underlying  structure  but  uncertainty  regarding  the  parameter  values  is  a 
sufficient  condition  to  assure  convergence  to  a  stationary  rational  expectations  equilibrium. 

These  articles,  as  well  as  important  earlier  contributions  of  Cyert  and  DeGroot  [6],  Rothschild  [23],  and 
Townsend  [26],  have  the  following  common  framework.  From  the  vantage  point  of  the  economic  actors, 
the  set  of  possible  complete  descriptions  of  the  relevant  time-invariant  economic  data  can  be  represented  as  a 
separable  metric  space  0  with  Borel  a-field  B.  The  actors  in  the  model,  uncertain  as  to  the  "true"  8°  e  0, 
have  prior  beliefs  |i(0)  on  (S,B).1  A  result  common  to  this  literature  is  a  "theorem"  that  the  sequence 
(H(t)}  of  posterior  beliefs  almost  surely  converges  weakly  to  |i(°°),  the  posterior  belief  conditioned  on  the 
limit  sub-<y-field. 

In  all  of  the  more  recently  authored  papers  the  Martingale  Convergence  Theorem  is  invoked  to  establish 
a.s.  convergence  of  the  sequence  of  Bayesian  beliefs,  where  the  a.s.  is  with  respect  to  the  prior  probability 
measures  of  the  Bayesian  agents.  An  essential  point  of  this  paper  is  that  this  is  not  necessarily  equivalent  to 
stating  that  a  passive  observer  who  is  informed  of  the  "true"  parameter  9°  would  attach  probability  one  to 
the  event  that  the  sequence  of  beliefs  {n(t)}  of  the  Bayesian  agents  would  converge  to  a  limit  belief  |i(°°). 

To  elaborate  on  this  distinction,  suppose  that  in  period  t  =  0,  1,  ...  ,  agents  observe  outcomes  in  a 
separable  metric  space  Y.  Given  the  behavioral  rules  of  the  agents,  each  parameter  value  9  e  0  induces  a 
probability  measure  Qq  on  the  product  space  Y°°.  The  prior  p.(0)  induces  a  measure  Pfi(0)  on  Y°°  defined  by 
P|i(0)(A)  =     Qe(A)PjI(0)(d9).  The  application  of  the  Martingale  Convergence  Theorem  yields  a.s. 

convergence  of  (|i(t)},  where  the  a.s.  statement  is  with  respect  to  the  probability  measure  Pjj.(0)-  But  this 
does  not  imply  that  for  any  particular  9  that  |i(t)  =>  |i°°  with  Qq  probability  one,  even  if  9  is  in  the  support 
ofn(0). 

One  might  hope  to  establish  a  result  that  for  a  "large"  class  of  priors,  posteriors  converge  for  a  "large" 
class  of  parameter  values.  When  0  can  be  embedded  in  finite-dimensional  Euclidean  space,  one  has  recourse 
to  Lebesgue  measure  m  (restricted  to  0)  as  a  natural  notion  of  size.  Then  if  m  «  |i(0)  the  exceptional  9 
set,  the  set  of  9  such  that  Qe((p.(t)  =>  n°°})  <  1,  has  m  measure  zero.  But  in  many  naturally  occurring 

Some  authors  allow  for  heterogeneous  beliefs  across  agents. 
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settings  0  is  not  finite  dimensional,  and  since  there  is  no  infinite  dimensional  analogue  of  Lebesgue 
measure,  a  measure-theoretic  criterion  is  unavailable. 

In  lieu  of  a  reference  measure  to  evaluate  size,  the  customary  procedure  is  to  resort  to  the  topological 
notion  of  category.  Residual  subsets  are  deemed  to  be  large,  and  subsets  of  first  category  (which  are 
complements  of  residual  subsets)  are  regarded  as  small. 

The  formal  analysis  in  this  paper  focuses  on  the  two-armed  bandit  problem  with  a  countable  number  of 
potential  outcomes  associated  with  each  arm.  Building  upon  results  of  Freedman  [13]  we  prove  that  mere  is 
a  residual  set  of  pairs  (0,  \i)  of  parameter  values  and  prior  beliefs  such  that  with  Qq  probability  one 
posterior  beliefs  never  converge.  Furthermore,  there  exist  reward  structures  with  one  arm  more  favorable 
than  the  other  for  an  informed  decision-maker;  yet  for  a  residual  set  of  pairs  (9,  \x)  both  arms  will  be  played 
infinitely  often  with  Qq  probability  one. 

II.  NOTATION  AND  MATHEMATICAL  PRELIMINARIES 
II.  1.  Notational  Conventions 

The  set  of  real  numbers  is  denoted  by  R.  The  set  of  natural  numbers  is  denoted  by  N  =  { 1,  2, ... }.  A 
generic  element  of  N  is  denoted  by  either  i,  n  or  L  The  set  of  non-negative  integers  is  denoted  by  No  =  {0, 
1,2,}.  If  Z  is  an  arbitrary  set  and  A  c  Z,  the  indicator  function  for  the  set  A  is  denoted  by  Ia  •  If  W  is  a 
topological  space,  then  C(W)  is  the  Banach  space  of  bounded,  continuous  real- valued  functions  defined  on 
W  with  the  sup  norm.  P(W)  is  the  space  of  probability  measures  on  W  endowed  with  the  topology  of  weak 
convergence.  P(P(W))  is  denoted  by  P2(W).  Unless  otherwise  specified  all  a-fields  are  the  Borel  a-field.  If 
(W,W0  is  a  measurable  space  and  w  e  W  the  Dirac  measure  8W  e  f(W)  is  defined  by  8W(A)  =  1  if  w  e  A 
and  5W(A)  =  0  if  w  e  A,  for  A  e  W. 

II. 2.  Category  Theory 

I  shall  briefly  review  the  concepts  relating  to  Baire  category.  Standard  references  include  Kelley  [17,  pp. 
200-203],  Oxtoby  [20]  and  Royden  [24,  Section  7.8].  Let  W  be  a  metric  space.  A  set  E  c  W  is  nowhere 

dense  if  E  has  empty  interior.  A  set  E  is  of  first  category  or  meager  if  it  is  the  union  of  a  countable 
collection  of  nowhere  dense  sets.  If  a  set  is  not  of  first  category  then  it  is  of  second  category.  The 
complement  of  a  set  of  first  category  is  a  residual  set. 

According  to  the  theorem  of  Baire  [24,  Theorem  7.27],  if  W  is  a  complete  metric  space  then  the 
intersection  of  a  countable  family  of  open  dense  subsets  of  W  is  itself  a  dense  subset  of  W. 

III.  THE  MODEL 
III.l.  Preliminary  Description 

There  are  a  countable  number  of  time  periods,  with  periods  indexed  by  t  =  0,  1,  2,  ....  At  the 
beginning  of  period  t  the  Bayesian  decision-maker  selects  an  action  (or  bandit-arm)  x(t)  e  X  =  [x\,  X2). 
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After  choosing  action  Xk,  an  outcome  y(t)  e  Yk  =  (yki,  yk2» ...}  is  observed,  and  a  period  reward  Rk(y(0) 
is  received.  The  range  of  Rk  is  bounded  with  cfc  =  inf(Rk(yki):  yki  e  Yk}. 

The  action  x(t)  and  outcome  y(t)  are  the  realizations  of  the  random  variables  X(t)  and  Y(t).  The 
probability  law  specifying  the  distribution  of  Y(t)  conditional  upon  X(t)  =  xk  is  9k  e  /'(Yk),  where  Yk  is 
endowed  with  the  discrete  topology.  The  objective  probabilities  0j  and  62  are  respectively  elements  of  the 
parameter  subspaces  ©1  =  P(Yi)  and  ©2  =  P(Y2),  and  the  vector  9°  =  (8j,  82)  is  an  element  of  the 
parameter  space  0  =  @i  x  ©2.  9^  is  initially  unknown  to  the  decision-maker,  and  so  is  viewed  from  her 
perspective  as  the  realization  of  a  random  element  with  distribution  n(0)  e  P(&).  (When  it  is  clear  from  the 
context  that  n  represents  a  prior  belief,  we  sometimes  write  |i  instead  of  |i(0).) 

The  objective  of  the  decision-maker  is  to  maximize,  with  discount  factor  a  e  (0,  1),  the  expectation  of 

the  present  value  of  the  reward  stream.  Denoting  for  notational  convenience  the  indicator  function 
IXk:  X  — »  {0,  1 }  by  Ik,  for  a  given  sequence  {x(t)j  of  actions  and  a  given  sequence  (y(t)}  of  outcomes,  the 

00  2 

total  reward  to  die  decision-maker  is  2^al-{  £lk(x(0)Rk(y(0))-  So  the  optimization  problem  to  choose  a 

t=0       k=1 

policy    (X(t)}    (a  precise  definition  of  a  policy  is  provided  in  section  III. 3)   that  maximizes 

00  2 

E[^al{  £lk(X(f))Rk(Yk(t))}]  where  the  expectation  is  taken  with  respect  to  the  probability  measure 
t=0       k=l 

induced  by  her  prior  belief  fi(0). 

The  policy  or  "rule"  that  the  decision-maker  abides  by  in  choosing  the  sequence  (x(f)}  depends  upon  a 

and  |i(0).  While  a  remains  fixed  throughout  the  paper,  our  goal  is  to  study  the  sensitivity  of  the  tail 

behavior  to  variation  in  the  prior  beliefs.  To  emphasize  this  dependence  of  actions  upon  the  prior  belief 

^i(O),  we  will  often  write  X(t,  ^i(O))  or  X(t,  n)  and  Y(t,  |i(0))  or  Y(t,  |i)  for  respectively  X(t)  and  Y(t). 

III.2.  Specification  of  the  Probability  Spaces 

Endowed  with  the  Prohorov  metric,  ©  is  a  complete  and  separable  metric  space.  The  Borel  subsets  of 
©1,  ©2,  and  ©  are  denoted  respectively  by  B],  B2,  and  B. 

The  set  of  possible  outcomes  from  repeated  play  of  arm  k  is  Yk°°.  Endowing  Yk°°  with  the  product 
topology,  the  product  a-field  is  denoted  byfy00 '.  To  facilitate  the  analysis  we  elect,  without  loss  of 
generality,  to  work  in  representation  space  (for  further  details  see  [8,  Section  1.6]).  Accordingly,  we  define 
the  measurable  spaces  (Q\,  Fj),  (^2.  ^2)  and  (Q,  F)  by  (Q^  Fk)  =  (Yk°°,  Yk°)  and  (Q,  F)  = 
(^1  x  &2.  F 2  x  P2)-  A  generic  element  of  Qk  is  denoted  as  cok  =  (u)ki,  0)k2.  •••)•  Associated  with  each 
9  =  (9i,e  ©,  is  an  "objective"  product  probability  measure  Qe  =  Qej  x  Qq2  on  (Q,  F)  defined  by  Qek= 

9k  x  9k  x  ...  .  The  interpretation  is  that:  (i)  cokn  is  the  element  y^i  e  Yk  that  is  observed  on  the  n'th 
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occasion  that  action  xk  is  chosen,  and  (ii)  that  conditional  upon  parameter  8k,  the  outcomes 
{(Okl,  0)^2.  •••}  fr°m  choosing  xk  are  i.i.d.  with  distribution  8k.2 

The  product  outcome  space  is  (Q,  F  )  =  (Q.\  x  Q.2,  F\  x  F2).  For  8  =  (61,62)  e  0  the  "objective" 
product  probability  is  Qe  =  Qej  x  Qq2.  The  set  {(Q,  F,  Qe):  8  e  0}  is  the  family  of  objective  probability 

spaces. 

To  avoid  extraneous  complications  we  make  a  restrictive  independence  assumption  that  the  prior  belief 
\i  g  P(&)  can  be  decomposed  as  the  product  probability  [i\  x  (12  of  the  marginal  distributions.3  The 
induced  prior  probability  P^  on  (Q  x  0,  F  x  5)  is  defined  for  measurable  rectangles  A  x  B,  by  P^(A  x  B) 
=  |BQe(A)  ji(0)(d8)  for  A  e  F  and  B  efl.4  By  the  Product  Measure  Theorem  [1,  Theorem  2.6.2]  there  is  a 

unique  extension  of  P^.  to  (Q.  x  0,  F  x  B  ).  The  family  of  subjective  probability  spaces  is 
{(Q  x  0,  F  x  B,  P^):  \i  e  P(Q)}.  Slightly  abusing  notation,  the  marginal  distribution  of  P^  on  (Q,  F) 
and  (0, 5)  is  also  denoted  by  Pu.  So  for  example  for  A  e  F,  P^(A)  =  Pp.(A  x  0). 

III.3.  Histories  and  Policies 

A  precise  statement  of  the  optimization  problem  is  provided  in  this  section.  We  start  by  specifying 
what  constitutes  an  admissible  plan.  Included  in  the  definition  of  an  admissible  plan  are  the  count  functions 
Ck(t):  ft  — >  No,  k  =  1,2.  The  realization  Ck(t)(co)  represents  the  number  of  times  arm  k  has  been  chosen 
through  time  t. 

An  admissible  plan  is  defined  as  sequences  of  random  variables  {X(t)},  {Y(t)},  (Ci(t)},  and  (C2(t)}, 
and  a  sequence  of  sub-a-fields  Ht  such  that: 

\)H0  ={0,Q) 

ii)  X(0)  is  Ho  measurable, 

iii)  If  X(0)(co)  =  xk,  then  Y(0)(co)  =  coki, 

iv)  for  t  >  1,  Ht  =  Ht.j  v  o(X(t-l),  Y(t-l)), 

v)  X(t)  is  Ht  measurable, 

vi)  Ck(0)  =  Ik(X(0)),  and  Ck(t)  =  Ck(t-1)  +  Ik(X(t)),  and 

vii)  If  X(t)(co)  =  xk  and  Ck(co)  =  n,  then  Y(t)(co)  =  w^. 

A  policy  is  a  sequence  (X(t)}  for  which  there  exists  a  (unique)  corresponding  admissible  plan  [{X(t),  (Y(t)}, 
(Ci(t)),  (C2(t)},  [///}].  An  optimal  policy  for  the  prior  /i  maximizes 

■^Technically,  it  is  the  projections  (Proj(k,  n)),  defined  by  Proj(k,  n)(o>)  =  co^  which  are  independent. 

-'Note  the  contrast  with  the  assumptions  of  Rothschild  [23].  Rothschild  allows  for  the  arms  to  be  dependent  but 

requires  the  outcome  space  be  a  two  point  set. 
4For  the  integral  to  be  well-defined  it  is  necessary  that  the  map  9  — >  Qe(A)  be  measurable.  This  is  a  consequence  of 

[21,  Lemma  6.1]. 
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C  oo         2 
la1-  {  IIk(X(t)(o)))Rk(Y(t)(a)))}P^(d(o)  over  the  set  of  policies. 
t=0      k=l 


a 


III.4.  Posterior  Beliefs  and  The  Optimal  Bayesian  Policy 

When  possible,  beliefs  are  updated  according  to  Bayes  rule.  When  an  event  of  prior  probability  zero 
occurs,  the  updating  is  arbitrary.  Since  none  of  the  results  of  this  paper  depend  upon  the  choice  of  version 
of  conditional  probability,  without  loss  of  generality  we  shall  assume  that  the  decision-maker  ignores  any 
outcome  to  which  she  assigned  a  prior  probability  of  zero.  More  precisely,  the  posterior  map  rk:  /'(©k)  x 
Yk  ->  P(e\d>  k  =  1,  2  is  defined  by: 
For  all  A  e  Bk  , 

jek(yki)  Uk(d6k) 

TkOik,  Yki)(A)  =  — whenever    Jek(yki)  Wc(d9k)  >  0,  and 

Jek(yki)  Wc(dek)  ©k 

rk(|ik,  yidXA)  =  (ik(A)  whenever    Jek(yki)  |ik(d0k)  =  0. 

©k 

For  prior  [i  =  m  x  \x.2,  the    sequences  {vi(n,  M-i)}nro  and    {V2(n,  \^1)}^1q  of  updating  functions, 
vk(n,  ilk):  Q  -»  P(&k)  are  recursively  defined  by: 
vk(0,  ^((0)  =  nk, 

vk(n,  ^k)(co)  =  Tk(vk(n-1,  ^ik)(o)),  CDkn)  for  n  >  1. 
The  interpretation  is  that  vk(n,  Hk)(co)  is  the  posterior  measure  on  (©k,  B0  after  arm  k  has  been  chosen  n 
times  with  prior  measure  |ik. 

Associated  with  a  policy  (X(t)}  and  the  prior  belief  p.  =  |ii  x  112  is  a  sequence  of  posterior  maps 
[\x(i)}  ~r  The  map  n(t):  Q.  ->  P(&)  is  defined  by  n(t)(co)  =  m(t)((o)  x  |i2(t)(o))  where  ^ik(t)(co)  = 
vk(Ck(co),  nk)((0). 

A  policy  (X(t)}  is  stationary  if  there  exists  a  measurable  function  71:  P(Q)  — »  X  such  that  X(t)(co)  = 
^(|i(t)(u)))  for  all  0)  e  Q.  Identifying  a  function  71:  P(&)  -»  X  with  the  stationary  policy  it  induces,  no 
confusion  should  result  from  referring  to  k  as  a  stationary  policy. 

Invoking  now  well-known  results  (for  details  see  [9],  [15]  or  [21])  the  Bayesian  optimal  control 
problem  can  be  reformulated  as  a  dynamic  programming  problem  with  state  space  P(@)  and  value  function 
V:  />(0)  — »  R.  Furthermore,  there  exists  an  optimal  stationary  policy  [5,  Theorem  7(b)]. 

Drawing  on  results  of  Gittins  and  Jones  [14],  with  refinements  by  Berry  and  Fristedt  [2],  Ross  [22]  and 
Whittle  [27],  there  is  an  optimal  policy  with  a  sharp  characterization.  Gittins  and  Jones  proved  the  existence 
of  functions  Mi:  P(S\)  ->  R  and  M2:  ^(©2)  -*  R  widi  the  property  that  it  is  optimal  to  choose  arm  1  in 
period  t  iff  Mi(jii(t))  >  M2(M2(0)-  The  function  Mk  is  often  referred  to  as  the  Gittins  Index  for  arm  k. 

To  define  Mi  consider  a  one-armed  bandit  problem  where  in  the  initial  stage  the  decision  maker  has  the 
option  of  playing  arm  1  or  stopping  and  collecting  a  terminal  reward  of  m.  In  subsequent  stages,  if  arm  1 


has  been  played  in  all  previous  stages  then  the  options  remain  selecting  arm  1  or  stopping  and  receiving  a 
final  payment  m.  The  value  of  the  optimal  policy  for  beliefs  \x\  e  P(&\)  is  denoted  as  Vi(m,  m).  The 
Gittins  index  is  defined  by  Mi(jii)  =  inf{m:  Vi(jii,  m)  =  m).  M2  is  defined  analogously. 

Throughout  the  remainder  of  the  paper  we  assume  that  the  Bayesian  controller  follows  an  optimal 
stationary  policy  7t*:  P(@)  — »  X  defined  by: 

7t*(jii  x  \i2)  =  xi  if  Mi(m)  >  M2(H2).  and  k*(\h,  112)  =  X2  if  M2(^2>  >  Mi(m). 

For  a  more  detailed  exposition  the  reader  is  advised  to  consult  Ross  [22],  Whittle  [27],  and  especially  the 
monograph  of  Berry  and  Fristedt  [2]. 

IV.  GENERIC  LIMIT  THEOREMS 
IV.  1.  Some  Continuity  Results 

Preparatory  to  proving  the  genericity  theorems,  some  preliminary  technical  results  are  developed  in  this 
subsection.  The  principal  result  is  the  establishment  of  the  continuity  of  the  functions  Mj  and  M2. 

The  first  step  is  to  develop  characterizations  of  V\  and  V2  by  making  use  of  the  fact  that  the  value 
functions  Vk  are  solutions  to  the  optimality  equation.  The  expected  one-period  reward  for  choosing  action 

Xfc  with    beliefs    |ik  is    denoted    by    the    function    Ufc:  P(&k)  ~*    R   defined   by   Uk(M-k)  = 

p  00 

[  Z^k(yki)6k(yki)]  Mk(dQk)-  Informally  denoting  expectation  taken  with  respect  to  current  beliefs  |ik  by 
J   i=l 

E^k  and  denoting  future  beliefs  with  a  prime  symbol,  the  optimality  equation  can  be  written  as: 

Vk(Wc.  m)  =  max  (m,  Uk(|ik)+  aE^k[Vk(^k',  m)]}. 
To  make  the  above  precise,  it  necessary  to  provide  formal  definitions  of  EUk  and  the  mapping  from 

current  beliefs  to  probability  distributions  over  future  beliefs.  So  for  k  =  1,  2,  define  *Fk:  /'(©k)  -*  P2(@k) 


[  SlAC^kd^k.  yki))"6k(yki)]  Hk(dQk)-  Conditional  upon  period  t  beliefs  (ik(t)  and  arm 
i=l 


by  Vk(MkXA)  = 

©k 

k  being  selected,  vFk(jik(0)(A)  can  be  interpreted  as  the  probability  from  the  point  of  view  of  the  decision- 
maker that  Hk(t+1)  e  A.  It  will  also  be  useful  to  have  an  expression  for  the  conditional  probability  of 
realization  yki  given  that  arm  k  is  chosen  with  beliefs  fik-  So  we  define  the  functions  ^k:  ^k  x  f(©k). 
k  =  1,  2,  by  A-k(yki.  Uk)  =    J9k(yki)  Uk(dQk)-  We  can  now  prove  the  following  results. 

ek 

LEMMA  4.1.  The  maps  p.k  ->  rk(|ik>  yki).  k  =  1,  2  are  continuous  for  all  i  e  N. 
Proof.  The  proof  which  follows  from  routine  arguments  is  for  completeness  included  in  the 
Appendix.  ■ 

PROPOSITION  4.2.  ¥1  and  ¥2  are  continuous. 
Proof.  The  proof  is  in  the  Appendix.  ■ 
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The  next  step  is  to  establish  the  continuity  of  Vi  and  V2.  The  boundedness  of  the  functions  R\  and 
R2  is  necessary  for  this  result.  Berry  and  Fristedt  [2,  p.42]  provide  a  counterexample  to  continuity  when  the 
reward  function  is  unbounded. 

LEMMA  4.3.  Vj  and  V2  are  continuous  functions. 

Proof.  It  suffices  to  prove  that  Vj  is  continuous.  Let  KcR  be  compact  and  define  Ek  = 
C(/>(©i)  x  K).  Define  TK:  EK  ->  EK  by  TKf(^i,  m)  =  Max  {m,  U1O11)  +  a-Jf^i',  m)  *iOli)(djllO). 

Note  that  Uj  is  bounded  since  R\  is  bounded.  Invoking  standard  arguments,  it  is  easily  confirmed  that  T^ 
is  a  contraction  mapping  with  fixed-point  <J>k:  P(©l)  x  K.  Since  <J>k  is  the  solution  to  the  optimality 
equation,  it  is  also  the  restriction  of  Vj:  P(&\)  x  R  ->  R  to  P(&\)  x  K.  Since  4>k  is  continuous,  V]  is 
continuous  on  every  compact  set  and  by  extension  W\  is  continuous.    ■ 

The  expected  loss  from  stopping  when  arm  k  is  the  only  arm,  is  given  by  the  function  L^:  /'(©k)  x  R 
—»  R  defined  by  Lk(|ik-  m)  =  Vk(Wo  m)  ■  m-  The  following  Lemma  collects  some  well-known  results. 

LEMMA  4.4.  For  k  =  1,2: 
(i)  the  function  V^Oik,  •)  is  increasing  and  convex,  and 
(ii)  the  function  Lk(|!k,  )  is  decreasing. 

Proof,  (i)  See  [20,  Lemma  VII.3.1],  (ii)  Replacing  sums  with  integrals,  the  proof  provided  in  [20, 
Lemma  VII. 2.1]  remains  valid.  ■ 

Now  consider  the  game  for  which  the  decision-maker  must  initially  play  arm  k,  and  thereafter  chooses 
between  continuing  to  select  arm  k  or  permanently  retiring  with  payoff  of  m.  The  value  of  pursuing  the 
optimal  strategy  for  this  game  is  given  by  the  function  Wk:  ^(©k)  x  R  -»  R  defined  by  Wk(Mk>  m)  = 
Uk(M-k)  +  ctjVk(|ik,  m)  ^(M^^Wc)-  ^Vith  the  following  Lemma  we  prove  that  the  set  of  terminal 

rewards  m,  for  which  the  decision-maker  is  indifferent  between  continuing  play  (and  possibly  stopping  at 
some  future  date  with  reward  m  received  at  that  future  date)  and  stopping  immediately  is  a  singleton. 

LEMMA  4.5.  (i)  Wi  and  W2  are  continuous,  (ii)  If  Mk(Uk)  <  m.  then  WQik,  m)  <  m. 

Proof,  (i)  The  proof  follows  from  Proposition  4.2  and  Theorem  5.5  of  [3]. 
(ii)  Suppose  Mk(jik)  =  m  <  m.  Since  m  is  a  terminal  payoff  for  which  the  controller  is  indifferent  between 
continuing  playing  arm  k  and  receiving  payoff  m,  Wk(jik,  m)  =  in  =  Vk(jlk>  m)-  According  to  Lemma  4.4 
for  all  |ik  e  /'(©k),  Vk(p.k.  m)  -  vk(Wo  m)  <  (m  -  m),  implying  that 
J[vk(Wc.  m)  -  Vk(nkl  in)]  *Fk(Wc)(dWc)  ^  (m  -  m).  Finally,  note  that 

Wk0ik,  m)  =  UkQlk)  +  a/VkOlfc,  m)^k(Mk)(d^ik) 
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=  Uk(Wc)  + 

«•  {/VkOik,  m)¥k(£kXd|ii)  +  [Jvk(p-k.  m)  n^kKdUk)  -  Jvk(nk.  m)  ^k^kXdpk)] } 
=  m  +  a-fJVkOlfc  m)  4>k(^k)(dWc)  -  JVk(Hk.  m)4>k(iIk)(dHk)] 

<  m  +  a(m  -  m) 

<  m.     ■ 

The  proof  of  continuity  of  Mi  and  M2  is  now  completed  by  verifying  that  these  functions  are  both 
upper  semicontinuous  (u.s.c.)  and  lower  semicontinuous  (l.s.c). 

PROPOSITION  4.6.  Mi  and  M2  are  continuous. 

Proof.  First  we  establish  that  Mk  is  l.s.c.  by  verifying  that  for  all  c  6  R  the  set 
(|ik  g  P(@k)'-  Mk(Uk)  >  c)  is  open.  Suppose  that  Mk(ilk)  >  c.  So  by  definition  of  Mk,  Vk(£k.  c)  -  c  = 
Lk(Mk.  c)  >  0.  From  the  continuity  of  Lk  (Lemma  4.3),  3  an  open  neighborhood  N^  of  Hk  such  that  for 
all  ^k  e  Nk>  Lk(Wc.  c)  >  0.  Therefore,  for  all  |ik  e  N/c,  Vk(Hk>  c)  -  c>  0  and  Mk(lik)  >  c. 

Upper  semicontinuity  is  confirmed  by  demonstrating  that  the  set  {|ik  s  /'(©k):  Mk(Wc)  <  c)  is  open 
for  all  c  e  R.  Suppose  that  Mk(Mk)  =  m  <  c.  By  Lemma  4.5,  Wk(jlk>  c)  <  c  and  3  an  open  neighborhood 
J/c  of  jik  such  that  for  |ik  e  Jk  .  Wk(Hk.  c)  <  c.  But  this  implies  that  Mk(M-k)  <  c.    ■ 

IV .l.Generic  Outcomes  when  c\  *  C2 

In  this  subsection  we  analyze  the  limit  outcomes  in  the  case  when  ci  *  C2  (Recall  that  Ck  = 
inf  (Rk(yki):  i  e  N}).  Without  loss  of  generality  we  assume  that  ci  <  C2-  To  understand  the  next  result, 
select  i*  €  N  such  that  Rl(yii*)  <  C2-  So  if  the  decision-maker  attaches  sufficiently  high  probability  to 
yii*  occurring  if  arm  1  is  selected,  arm  2  will  be  selected  regardless  of  |i2(t).  Furthermore,  the  Proposition 
of  Freedman  [13]  stated  below  implies  that  for  a  residual  set  of  (81,  M-i(O))  pairs,  the  probability  of  playing 

arm  1  infinitely  often  with    J0i(yii*)  |ii(t)(d0i)  bounded  away  from  one,  is  zero. 

©1 

PROPOSITION  4.7  [Freedman].  For  k  =  1,2,  the  set  Lk  of  pairs  (9k,  Hk(0)}  c  0k  x  /'(©k)  wiln  lim 
supt->«    jHk(0(tok)(U)  Qek(dC0k)  =  1  simultaneously  for  all  nonempty  open  subsets  <?/@k  W  residual. 
Qk 

So  for  a  residual  set  Ii  c  ©1  x  P(Q\),  if  arm  1  is  played  sufficiently  often  the  decision-maker  will 
eventually  (Qe  a.s.)  be  sufficiently  pessimistic  regarding  the  realization  9i  that  arm  1  will  never  be  tested 
again. 
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PROPOSITION  4.8.  Suppose  c\  <  C2-  Then  there  exists  a  residual  subset  1\  c  0j  x  P(Q\)  such  that 

for   (9i,  m(0))  €  Zi  and  all   (02-H2(O))   e  02  x />(02),   Qe  a.s.   X(t,  m(0)  x  n2(0))(co)  =  xk  only 

finitely  often. 

Proof    For  i  e  N,  define  5?  e  P(@\)  by  5?(B)  =  1  if  Si  €  B,  else  6?(B)  =  0.  (\i\  =  5?  corresponds  to 

the  belief  that  outcome  i  occurs  with  probability  one  whenever  action  x\  is  chosen.)  Choose  i*  e  N  s.t. 

Rl(yii*)  <  c2-  Observing  that  MiCSj)  =  — 'r-and  recalling  that  Mj  is  continuous  (Proposition  4.6), 

2                                             c2 
there  exists  a  neighborhood  U  of  5j  s.t.  for  m  6  U,  Mi(jii)  < :  Note  that  if  ni(t)(to)  e  U,  then  for 

all  t'  >  t,  Xt(co)  =  X2.  The  to  set  for  which  vi(n,  M-l)(<u)  is  never  in  U  is  denoted  as  Di(m)  = 
{co:  vi(n,  m)(a>)  e  U.  n  =  1,  2,  ... }.  Define  Ii  =  {(6,  m(0)):  Qe(DiGii))  =  0).  By  Proposition  4.7,  Si 

is  residual. 

For  (0,  |ii(0))  g  Zi,  define  n*(co)  =  inf{n:  vi(n,  |ii(0))(to)  e  U}.  n*  is  Q0  a.s.  finite  and  so  Q0  a.s., 

for  all  t  the  count  function  Ci(t)(co)  <  n*(co)  <  °o.  ■ 

IV. 3.  Generic  Outcomes  when  ci  =  C2 

ci 
To  understand  the  next  result  suppose  that  ci  =  c2  and <  Mi(|ii(0))  <  M2(M-2(0))-  Then 

generically,  Qe  a.s.  there  will  exist  some  time  t,  say  ti,  such  that  M2(|i2('cl))  <  Ml(m(0))-  Then 
reversing  the  roles  of  the  arms,  eventually  at  some  time  x2  >  Ti,  Mi(|ii(T2))  <  M2(p-2(Tl)).  etc-  And  so 
each  arm  will  be  chosen  infinitely  often.  Observe  that  there  exist  consistent  estimators  of  (0i,  02)  and  so 
from  the  point  of  view  of  a  classical  statistician,  there  will  eventually  be  overwhelming  evidence  as  to 
which  arm  offers  superior  prospects. 

PROPOSITION  4.9.  Suppose  that  c\  =  C2  and  that  the  range  of  Rfc,  k  =  1,  2,  is  not  a  singleton.  Then 
there  is  a  residual  subset  9*  c  0  x  P(@)  such  that  for  (8i,  02»  Hl»  M-2)  e  ^»  with  Qe  probability  one 
X(t,  p.(0))  =  xi  infinitely  often  and  X(t,  |i(0))  =  X2  infinitely  often. 

Proof.  For  (i  e f(0)  define  the  set  Ei(fi)  =  {co:  X(t,  |i(0))  =  X2  finitely  often}.  The  strategy  is  to  first 
prove  that  there  exists  a  residual  subset  9? i  c  0  x  /*(©)  such  that  for  (9i,  02,  M-i,  M-2)  €  9?i, 
Q9(Zi(|i))  =  0.  Defining  E2  analogously,  by  an  identical  argument  there  exists  a  residual  subset 
9*2  c  0  x  />(©)  such  that  for  (0i,  02,  M-i,  U2)  6  ^2.  Q0(-2(H-))  =  0-  Since  the  countable  intersection  of 
residual  sets  is  a  residual  set,  9?  defined  by  9*  =  9*  i  n  9*2  is  residual. 

We  now  verify  the  existence  of  such  a  set  9*i.  For  k  =  1,  2,  define  0£  =  {0k  e  0k:  V  i  e  N, 
6k(yki)>0}.  0£  is  residual  [13,  Remark  I].  The  sets  Ak  c  f(0k)  defined  by  A  k  = 
{^k  e  P(Q\d-  Mc(6f)  >  0)  are  also  residual  [13,  Remark  2],  and  so  A  =  Ai  x  A2  is  residual  [20, 
Theorem  15.3]. 
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For  n  g  N,  3  i(n)  such  that  Ri(yii(n))  <  cl  +  n_1  and  hence  Mi(5j/2n))  < — ■  By  the  continuity 


9                                                         ci  +  n 
of  Mi,  there  exists  a  neighborhood  Nn  of  SjAO  such  that  for  m  g  Un,  M(jii)  <— ■ :  Define  N  to  be 

a  family  of  such  neighborhoods.  So  if  |I2(0)  g  A2,  then  for  all  (o  e  LI  and  V  n'  g  N,  3  jVn  g  N  s.t. 

sup     [MiOii)]  <  M2(v2(n',  co)). 
meNn 

Define  #(|ii)  =  {(o  6  Q:  3  n(co)  s.L  for  all  n  >  n((u),  vi(j,  m)(G))  g  Un  for  only  finitely  many  j}. 

Define  9*i  =  [(]i,  9):  \i  g  A,  Q0(H(m,  H2))  =  0  and  observe  that:  (i)  for  0  =  (6i,  92),  Q0(#(Hl))  does  not 

depend  upon  02,  and  (ii)  for   p.  =  (m,  J12)  g  A,  Z(hi,  112)  c  i3(|ii).  A  direct  consequence  is  that 

9tl3{(M):neA,QeO&(ni))  =  0}. 

The  proof  is  completed  by  confirming  that  [(p.,  0):  p.  g  A,  Q0(i3(Hl))  =  0)  is  residual  and  hence  9*  1  is 
residual.  Define  K  1  =  {(]ii,  0i)  g  P(Q\)  x&\:  Q0(t3(jii))  =  0}.  According  to  Proposition  4.7,  K  1  is 
residual.  By  application  of  the  category  analog  of  Fubini's  theorem  [20,  Theorem  15.3],  A]  x  0]  is 
residual,  and  so  K  1  n  (Ai  x  ©1)  is  residual.  Again  invoking  [20,  Theorem  15.3],  {(m,  ^2»  6l»  02): 
(HI,  0i)  g  N  1  n  (Ai  x  @i),  |i2  g  A2)  is  residual.  And  by  definition,  (\i\,  \\.2,  01,  02):  (M-l,  ^l)  e 
X  1  n  (Ai  x  0!),  w  e  A2}  =  (Oi.  0):  [ie  A,  Qe(d(m))  =  0}.  ■ 

V.  CONCLUDING  REMARKS 

It  is  natural  to  inquire  as  to  the  generality  of  Propositions  4.7  and  4.8.  Is  there  some  peculiarity 
associated  with  having  a  countable  outcome  space  that  underlies  these  nonintuitive  propositions?  In  cases 
when  the  parameter  space  is  a  smooth,  finite  dimensional  set  (as  in  [10],  [16],  and  [19])  one  could  hope  to 
modify  the  theorems  of  Schwartz  [24]  and  prove  a.s.  convergence  of  beliefs  for  all  0  g  0.  (This  work 
though,  remains  to  be  undertaken.)  But  the  results  of  Freedman  [13]  and  of  this  paper  can  be  extended  to 
other  infinite  dimensional  settings.  For  instance,  suppose  for  k  =  1,  2  the  outcome  space  Yk  =  [0, 1].  Then 
if  ©k  is  the  set  of  all  density  functions  with  the  Li[0,  1]  topology,  with  minimal  modification  all  proofs 
remain  valid.  Furthermore,  I  conjecture  that  the  restriction  that  the  decision-maker  choose  form  a  finite  set 
of  actions  is  inessential.  So  while  a  complete  answer  is  not  yet  available,  it  appears  that  the  only  crucial 
assumption  is  that  the  parameter  space  0  is  infinite  dimensional. 

To  assess  the  significance  of  the  results  contained  in  this  paper  with  regard  to  economic  modelling  with 
Bayesian  learning,  requires  addressing  die  issue  of  the  "size"  of  the  residual  set  R.  The  customary  practice 
is  to  equate  residual  with  "large"  (even  though  in  Rn  there  are  residual  sets  of  Lebesgue  measure  zero).  If 
this  equivalence  is  accepted  and  if  prior  beliefs  are  arbitrarily  chosen,  then  one  is  lead  to  conclude  as  did 
Freedman  [13],  "for  essentially  any  pair  of  Bayesians,  each  thinks  the  other  is  crazy." 

An  obvious  objection  to  this  stark  prediction  is  that  there  is  little  basis  for  automatically  classifying 
residual  sets  as  "large".  On  the  other  hand,  there  are  no  grounds  for  asserting  that  the  "bad"  pairs  of 
parameters  and  priors  are  in  any  sense  small.  A  more  cautious  interpretation,  relying  only  on  the  density  of 
91  and  of  the  Dirichlet  priors,  would  stress  the  sensitivity  of  asymptotic  beliefs  to  the  prior.  In  particular,  it 
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is  illegitimate  to  restrict  attention  to  a  computationally  convenient  class  of  priors  (even  when  dense)  and 
presume  mat  any  resulting  limit  theorems  extend  beyond  the  initial  family  of  distributions. 

A  second  point  of  contention  with  the  remark  of  Freedman,  is  that  it  may  be  appropriate  to  impose  a 
priori  restrictions  on  the  family  of  admissible  prior  beliefs.  Freedman  himself  now  appears  to  support  this 
stance.  Diaconis-Freedman  [7]  advocate  that  Bayesian  statisticians  adopt  what  they  label  the  "what  if 
method.  This  consists  of  "...  after  specifying  a  prior  distribution  generate  imaginary  data  sequences, 
compute  the  posterior,  and  consider  whether  the  posterior  would  be  an  adequate  representation  of  the  updated 
prior."  Adapting  this  reccommendation  to  the  context  of  economic  modelling,  it  would  be  natural  to  require 
that  agents  have  prior  beliefs  with  full  support  and  that  the  sequence  of  agent  posterior  beliefs  converges 
almost  surely  with  respect  to  the  measure  Qe  for  all  9  e  0. 

While  this  author  is  sympathetic  to  imposing  such  restrictions,  these  auxiliary  conditions  are  ad  hoc; 
Axioms  of  Bayesian  decision  theory  provide  no  foundation  for  such  constraints  on  prior  beliefs. 
Furthermore,  while  the  Dirichlet  priors  constitute  an  example  of  such  priors  within  the  specific  framework 
of  this  paper,  the  existence  of  such  priors  for  more  general  learning  models  (such  as  in  [8])  has  not  yet  been 
verified.  Further  research  on  this  issue  is  needed. 

Appendix 

Proof  of  Lemma  4.1.  It  suffices  to  prove  that  Ti(-,  y\\)  is  continuous.  To  simplify  the  notation  for  this 
proof,  representative  elements  of  Y\,  &\,  and  P(S\)  are  denoted  respectively  as  yj,  9,  and  |i.  Let  F  c  ©i 
be  a  closed  set  and  suppose  u.n  =>  p.  e  P(@\).  Applying  a  standard  characterization  of  weak  convergence  [B, 
Theorem  2.1],  it  suffices  to  show  that  limn  ri(jj.n,  yi)(F)  <  ri(^t,  yi)(F).  Or  equivalently  that  the  map  \i 
->  T\(p.,  yi)(F)  is  upper  semicontinuous  u.s.c. 

/9(y0H(d9) 

F 

Recall  that    T\([i,  yi)(F)  = :  And  note  that  since  the  map  9  — >  9(yj)  is  bounded  and 

J9(y0p.(d9) 

ei 

continuous  the  map  \i  — >    J9(y0  |i(d9)  is  continuous.  So  the  proof  will  be  completed  by  confirming  that 
©1 

the  map  |i  -»  JQ(y\)  |i(d9)  u.s.c. 
F 

The  map  9  -»  lF(9)-9(yi)  is  u.s.c,  and  by  [3,  Exercise  7,  p.  17]  the  map  |i  ->    JlF(9)-9(y0  |i(d9)  is 

01 

u.s.c.  But  J9(y0  p_(d9)  =    JlF<9)-9(y0  u.(d9),  and  so  the  map  p.  -»  J9(y0  p(d9)  u.s.c.  ■ 
F  0!  F 

Proof  of  Proposition  4.2.  It  suffices  to  prove  that  *¥\  is  continuous.  As  in  the  proof  of  Lemma  4.1,  for 
notational  simplicity  representative  elements  of  Yi,  0j,  and  P(&\)  are  denoted  respectively  as  yj,  9,  and  \L 
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Let  f:  P(&\)  -*  R  be  a  bounded  continuous  function  and  {\in}  a  sequence  with  |in  =>  p..  To  establish 

continuity  of  4*1,  we  must  verify  that      /f(j_i')  *¥ \ (jin)(dpO  -»      Jf(ji')  vFi(iI)(dM.')>  or  equivalently  that 

P(S)  P(0) 

oo  oo 

f[^f(TiOi,  yi)-e(yi))]  Hn(d0)  ->  J[]^f(riOi.  yO-eCyj))]  ji(d9). 
ei=l  ei=l 

oo 

Define  the  functions  g:  0i  ->  R  and  gn:  ©i  ->  R  by  g(0)  =  £f(T(Ji  yOHCyi)  and  gn(0)  = 

i=l 

oo 

Xf(r(nn,  yO)-Q(yi)-  Since  the  maps  y,  ->  f(r(ji  y,))  and  y,  ->  fCr^^  y,))  are  bounded  and  continuous,  g 
i=l 

and  gn  are  continuous.  Furthermore,  as  is  now  demonstrated,  gn(0)  -*  g(0),  ji  a.s..  Define  X^  e  P(Y\)  by 

oo 

XU(A}  =  SlACyD'Myii  H)  f°r  Ac  Yj.  The  support  of  fi  is  contained  in  Ilc8i,  defined  by  n  = 
i=l 

(9e  0i:  0  «  Xji).  For  0  e  n,  0(y,)  >  0  implies  that  T(-,  yj)  is  continuous  at  jl  [Lemma  4.1],  and 
hence  ffJX-,  yO)  is  continuous  at  jl.  Denoting  the  Ue  norm  of  f  by  llflU,     Mlflleo  0(dyi)  <  °°.  By  a  standard 

reult  on  the  preservation  of  continuity  under  integration  [4,  Theorem  16.8],  for  0  e  n  the  map 

oo 

\i  -»    £f(T(n,  yi))0(yi)  is  continuous  at  £  and  so  gn(0)  ->  g(0). 
i=l 

Finally,  |gn(0)  Md0)  ->  Jg(9)  p(d0)  by  [3,  Theorem  5.5],  or  equivalently 

e  e 


£f(TiOi,  yi)-0(yi))]  lin(d0)  -»  J[^f(TiOl,  yi)-0(yi))]  H(d0). 
i=l  0  i=l 
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